Dogma of Molecular Biology 


1. Introduction 


Thousands of genes are being discovered for the first time by sequencing 
the genomes of model organisms, a reminder that much of the natural world 
remains to be explored at the molecular level. DNA microarrays provide a 
natural vehicle for this exploration. The model organisms are the first for 
which comprehensive genome-wide surveys of gene expression patterns or 
function are possible. The results should be viewed as maps that reflect the 
order and logic of the genetic program, rather than the physical order of 
genes on chromosomes. Exploration of the genome using DNA microarrays 
and other genome-scale technologies should narrow the gap in our 
knowledge of gene function and molecular biology. 


2. Dogma of Molecular Biology 


Deoxyribonucleic acid (DNA) is the elementary template carrying the essential 
genetic code of every living organism. In bacteria and other simple single-celled 
organisms, DNA is distributed more or less throughout the cell. In the 
complex cells that make up plants, animals and other multicellular 
organisms, most of the DNA is found in the chromosomes, which are 
located in the cell nucleus. The energy-generating organelles known as 
chloroplasts and mitochondria also carry DNA, as do many viruses. A DNA 
molecule consists of two strands, which entwine like vines to form a double 
helix. DNA strands are composed of four nucleotide subunits. These are 
adenine (A), thymine (T), cytosine (C) and guanine (G). Each base 
hydrogen bonds readily to only one other -- A to T and C to G. The entire 
nucleotide sequence of each strand is therefore complementary to that of the other, 
and when separated, each may act as a template with which to replicate the 
other. The information contained in the DNA allows for the development 
and control of all processes taking place in a living organism over its 
lifetime, not only at the cellular level but also at the level of the whole system. 
The general structure of DNA is depicted in Figure 1. 
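
The pairing rule can be made concrete with a few lines of Python. This is only an illustrative sketch: the example sequence is invented, and the function assumes the usual convention that the two strands of the helix run in opposite directions, so the partner strand is read in reverse.

```python
# Base pairing as stated in the text: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, read in the opposite direction."""
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


strand = "ATGGCA"  # hypothetical example sequence
partner = complementary_strand(strand)
print(partner)  # TGCCAT
# Using the partner as a template restores the original strand.
print(complementary_strand(partner) == strand)  # True
```

Applying the function twice returns the starting sequence, which is the property that lets each separated strand serve as a template for the other.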

The DNA structure. 




To read the information contained in DNA, its functional 
units, the genes, are first transcribed into messenger ribonucleic 
acid (mRNA), whose sequence is complementary to the DNA template strand. mRNA 
molecules serve as templates for protein synthesis; they are transported 
to the cytoplasm and repeatedly read by the ribosomes. Before the mRNA is 
ready to be translated, it undergoes several processing steps, e.g. splicing, in 
which the pre-mRNA is modified to remove certain stretches of non-coding 
sequence called introns. The stretches that remain include the protein-coding 
sequences and are called exons. Finally, consecutive triplets of 
nucleotide bases (codons) of the mRNA sequence are translated into the corresponding 
amino acids, which are linked together to form protein chains. Proteins are 
required for the structure, function, and regulation of the cells, tissues and 
organs. Each protein has its own unique function. The process of reading 
the content of a gene is depicted in Figure 2. 
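
The three steps described above (transcription, splicing, translation) can be sketched in Python. The sketch rests on several assumptions that are not part of the original text: the toy gene and its exon coordinates are invented, only a handful of codons from the standard genetic code are listed, and transcription is written from the coding strand (replacing T with U), which gives the same mRNA as complementing the template strand.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODON_TABLE = {
    "AUG": "Met", "GCU": "Ala", "GGA": "Gly", "UUU": "Phe",
    "AAA": "Lys", "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """The mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon intervals (0-based, end-exclusive); introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> list[str]:
    """Read consecutive codons until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


# Hypothetical gene: exon 1, a six-base intron, exon 2.
gene = "ATGGCT" + "GTAAGT" + "GGATTTTAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, exons=[(0, 6), (12, 21)])
print(mrna)             # AUGGCUGGAUUUUAA
print(translate(mrna))  # ['Met', 'Ala', 'Gly', 'Phe']
```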


To understand the role and function of genes, one needs 
complete information about their mRNA transcripts and proteins. 
Unfortunately, exploring protein function is very difficult because of the 
complicated three-dimensional structure unique to each protein and a shortage of 
efficient technologies. To overcome this difficulty one may concentrate on the 
mRNA molecules produced by the genes of interest (gene expression) and 
use this information to investigate the functional roles of the genes. This 
idea motivated the development of the microarray technique, a 
method for studying the interactions between thousands of genes 
based on their mRNA transcript levels. 


The Central Dogma of Molecular Biology. 


Gene Networks 


Gene Networks. 


A gene regulatory network (also called a GRN or genetic regulatory 
network) is a collection of DNA segments in a cell which interact with 
each other and with other substances in the cell, thereby governing the rates 
at which genes are transcribed into mRNA. Genes can be viewed as nodes 
in such a network, with the inputs being proteins such as transcription factors 
and the outputs being the levels of gene expression. The node itself can also be 
viewed as a function obtained by combining basic functions 
of the inputs (in a Boolean network these are Boolean functions 
built from the basic AND, OR and NOT gates familiar from electronics). 
These functions have been interpreted as performing a kind of information 
processing within the cell, which determines cellular behaviour. The basic 
drivers within cells are the levels of certain proteins, which determine both 
the spatial (tissue-related) and temporal (developmental-stage) coordinates of 
the cell, as a kind of "cellular memory". Gene networks are only 
beginning to be understood, and a next step for biology is to attempt to 
deduce the function of each gene "node", to assist in modeling the behaviour 
of a cell. Mathematical models of GRNs have been developed to allow 
predictions of the models to be tested. Various modeling techniques have 
been used, including Boolean networks, Petri nets, Bayesian networks, and 
sets of differential equations. Conversely, techniques have been proposed 
for generating models of GRNs that best explain a set of time series 
observations. 
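
To illustrate the idea of a gene as a node that computes a logical function of its inputs, here is a minimal Python sketch of a three-gene Boolean network. The gene names and the wiring are invented for illustration, and the sketch assumes that all genes are updated at the same time from the current state.

```python
from typing import Callable, Dict

State = Dict[str, bool]

# Hypothetical three-gene network; the wiring is invented for illustration.
RULES: Dict[str, Callable[[State], bool]] = {
    "geneA": lambda s: not s["geneC"],             # NOT gate: C represses A
    "geneB": lambda s: s["geneA"],                 # B follows its activator A
    "geneC": lambda s: s["geneA"] and s["geneB"],  # AND gate: C needs A and B
}


def step(state: State) -> State:
    """Compute the next state of every gene from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> list:
    """Return the list of states visited, starting with the initial one."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


start = {"geneA": True, "geneB": False, "geneC": False}
for t, s in enumerate(trajectory(start, 5)):
    print(t, {gene: int(on) for gene, on in s.items()})
```

With this particular wiring the network returns to its starting state after five updates, a small example of the cyclic behaviour such models can show.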


One gene can affect the expression of another when its gene 
product binds to the promoter region of the other gene. When 
more than two genes are considered, the set of regulatory 
interactions between them is referred to as the regulatory network. Given a 
large number of measurements of the expression levels of a number of genes, we 
should be able to model, or reverse engineer, the regulatory network that 
controls their expression. The problem can be attacked in two fundamentally 
different ways: using time-series data, or using steady-state data from gene 
knockouts. 


GRNs act as analog biochemical computers to specify the identity and level 
of expression of groups of target genes. Central to this computation are 
DNA recognition sequences with which transcription factors associate. 
When active transcription factors associate with the promoter region of 
target genes, they can function to specifically repress (down-regulate) or 
induce (up-regulate) synthesis of the corresponding RNA. The immediate 
molecular output of a gene regulatory network is the constellation of RNAs 
and proteins encoded by network target genes. The resulting cellular 
outputs are changes in the structure, metabolic capacity, or behavior of the 
cell mediated by new expression of up-regulated proteins and elimination of 
down-regulated proteins. 


GRNs are remarkably diverse in their structure, but several basic properties 
are illustrated in the figure below (Figure 1). In this example, two different 
signals converge on a single target gene where the cis-regulatory elements 
provide for an integrated output in response to the two inputs. Signal 
molecule A triggers the conversion of inactive transcription factor A (green 
oval) into an active form that binds directly to the target gene's cis-regulatory 
sequence. The process for signal B is more complex. Signal B 
triggers the separation of inactive B (red oval) from an inhibitory factor 
(yellow rectangle). B is then free to form an active complex that binds to 
the active A transcription factor on the cis-regulatory sequence. The net 
output is expression of the target gene at a level determined by the action of 
factors A and B. In this way, cis-regulatory DNA sequences, together with 
the proteins that assemble on them, integrate information from multiple 
signaling inputs to produce an appropriately regulated readout. A more 
realistic network might contain multiple target genes regulated by signal A 
alone, others by signal B alone, and still others by the pair of A and B. Co- 
regulated target genes often code for proteins that act together to build a 
specific cell structure or to effect a concerted change in cell function. For 
example, genes encoding components of the multiprotein proteasome 
machine (see The Machines of Life) are co-regulated at the RNA level. This 
was shown by microarray gene chip analyses in yeast cells, and each gene 
was found to possess a similar cis-regulatory DNA sequence that mediates 
binding of a particular transcription factor. Similarly, a bacterium may 
respond to a shortage of its preferred energy source by activating expression 
of genes whose protein products function in a biochemical pathway that 
allows it to use a different, more abundant source of energy. 
The gene regulatory network. 


Note: Boolean Networks 


Note: Probabilistic Boolean Networks 


Note: Bayesian Networks 


cDNA-Basic Concept 


cDNA-Basic Concept 


Recently, several types of DNA microarrays have been introduced. 
Applications of microarrays range from the study of gene expression in 
yeast (Lashkari et al., 1997) under different environmental stress conditions 
to the comparison of gene expression profiles for tumors from cancer 
patients (Golub et al., 1999). The first approach is to use a 
form of DNA called complementary DNA (cDNA), synthesized from an mRNA template, 
which contains only the coding part of the sequence and is complementary to its 
corresponding mRNA transcript. Microarrays take the form of microscope 
slides containing hundreds to thousands of immobilized DNA samples that 
are hybridized in a manner very similar to the Northern (Alwine et al., 
1977) and Southern (Southern, 1975) blots. The main function of a 
microarray is to detect the level of the mRNA transcripts of the genes of interest. 
The slides are incubated in a solution containing the genetic material under 
consideration. The labeled transcripts in the solution 
hybridize to their complementary cDNA, previously placed on the 
microarray chip. Because the sample is fluorescently labeled, every 
spot emits light when excited, with an intensity that 
depends on the amount of hybridized material (Schena et al., 1995). 
Labeling the samples with different fluorescent dyes allows the comparison of 
gene expression under different experimental conditions (case-control 
studies). The preparation of a microarray for a case-control study is 
schematically depicted in Figure 1. Initial data obtained from DNA 
microarrays are in the form of scanned images. Coding gene expression 
by means of colors can be helpful for building genetic maps and for graphical 
data processing. A gene expression map is presented in the form of a table 
whose rows correspond to consecutive genes and whose columns 
represent different samples, for example under multiple experimental 
conditions or for different patients. More information is available at 
Bioconductor (follow the link to training). 

The spotted array technology. 
Overview of Procedures for Preparing and Analyzing 
Microarrays of Complementary DNA (cDNA). As 
shown in Panel A, reference RNA and tumor RNA are 
labeled by reverse transcription with different fluorescent 
dyes (green for the reference cells and red for the tumor 
cells) and hybridized to a cDNA microarray containing 
robotically printed cDNA clones. As shown in Panel B, 
the slides are scanned with a confocal laser-scanning 
microscope, and color images are generated for each 
hybridization with RNA from the tumor and reference 
cells. Genes up-regulated in the tumors appear red, 
whereas those with decreased expression appear green. 
Genes with similar levels of expression in the two 
samples appear yellow. Genes of interest are selected on 
the basis of the differences in the level of expression by 
known tumor classes (e.g., BRCA1-mutation-positive 
and BRCA2-mutation-positive). Statistical analysis 
determines whether these differences in the gene-expression 
profiles are greater than would be expected 
by chance. As shown in Panel C, the differences in the 
patterns of gene expression between tumor classes can be 
portrayed in the form of a color-coded plot, and the 
relations between tumors can be portrayed in the form of 
a multidimensional-scaling plot. Tumors with similar 
gene-expression profiles cluster close to one another in 
the multidimensional-scaling plot. 


Note: cDNA arrays - detailed information. 


Note: Oligonucleotide arrays. 


cDNA-Detailed Information 


Detailed information on the cDNA technology 


To prepare microarrays, glass or nylon microplates are used, onto which 
thousands of single-stranded pieces of DNA, tens of nucleotides in length, 
are placed (Cheung et al., 1999). Each spot on the plate corresponds to a 
particular gene. Special computer-controlled three-axis robots generate 
high-density, gridded arrays of cDNA. Figure 1a, b and c present an example 
of a workstation for producing microarrays. Figure 1d depicts a scanner 
used for reading a microarray with the genetic material introduced during the 
course of an experiment. 

The microarray robot. 


a, The Pennsylvania University's microarray robot 
(Cheung et al., 1999). The X-, Y- and Z-axes are labeled 
1, 2, and 3, respectively. The key component of the 
arrayer is the print-head, containing pens (4). 
Microscope glass slides are placed on the slide station 
(5). Samples are prepared and arrayed from 96-well 
sample plates (6). The pins are cleaned between 
sample acquisitions at the washing (7) and drying (8) 
stations. b, AECOM microarray robot. The table 
configuration shown contains 160 slides with four 
microtitre plates, two wash stations and the dryer. The 
print-head (c) shows four of the possible twelve pen 
tips in use. d, AECOM laser scanner. Visible are the 
optical table, power supplies for lasers and PMT 
cooling, the Ludl stage, and lasers. The 20× microscope 
objective is inside the Ludl stage, while lenses, mirrors 
and other optics are enclosed in the metal casing. 
PMTs are to the right and outside the photo. 


In a single reaction, two different probes can be labelled with different 
colors and simultaneously incubated with a microarray. Robots (arrayers) 
are required to place (or array) a large number of probes onto slides. The 
AECOM arrayer generates high-density, gridded arrays of cDNA, genomic 
DNA or similar biological material on glass surfaces. Its principal 
components are a computer-controlled three-axis robot and a unique pen 
tip assembly. The wash stations are stationary basins containing distilled 
water that is replaced after every two microtitre plates. When the pen tips are 
immersed, the robot shakes the pen assembly back and forth to enhance 
cleaning. A computer-controlled water bath sonicator and/or flowing water 
bath could be substituted. The dryer is essentially a computer-controlled 
wet/dry vacuum cleaner and an adapter fitted with restricting inlet holes 
into which the pen tips are inserted. Drying is accomplished by the rapid 
airflow around the tips and the partial vacuum this creates. 


After DNA samples are arrayed onto slides, they are air-dried. The samples 
are immobilized by ultraviolet (UV) irradiation, which forms covalent bonds 
between the thymidine residues in the DNA and the positively charged 
amine groups on the silane slides. After crosslinking, excess DNA 
molecules are removed by washing the arrays at room temperature, and the 
arrayed samples are denatured in water before hybridization. There are 
many methods for hybridizing targets and probes; they differ with respect to 
the solvents and temperatures used. Figure 2 presents the typical 
process of nucleic acid hybridization. 


Once extracted from the two populations, the RNA samples are typically 
labeled with fluorescent dyes in order to generate probes. The commercial 
cyanine dyes Cy3 and Cy5 are commonly used in labeling reactions. 
Fluorescently labeled probes can be prepared by several different methods, 
including direct or indirect cDNA labeling (Hegde et al., 2000; Richter et 
al., 2002; Van Gelder et al., 1990). After cDNA synthesis, a fluorescent 
cascade molecule with hundreds of dye molecules per complex is 
hybridized to the cDNA. The labeled probes prepared from the two RNA 
sources are co-hybridized to the same DNA chip. Important parameters 
include hybridization temperature, length of hybridization, concentration of 
salts, pH of the solution, and the presence or absence of denaturants such as 
formaldehyde in the hybridization buffer. The hybridized array is typically 
scanned with a system that uses lasers as a source of excitation light and 
photomultiplier tubes as detectors. This system is capable of differentiating 
the fluorescently labeled probes. 


The nucleic acid hybridization. 


Note: Oligonucleotide arrays. 


Note: cDNA arrays - basic Concepts 


Affymetrix Chip-Basic Concepts 
Oligonucleotide arrays. 


4.2.1.Overview 


The oligonucleotide arrays, developed by the Affymetrix Company, are a 
new approach in microarray technology, based on hybridization to small, 
high-density arrays containing tens of thousands of synthetic 
oligonucleotides. The arrays are designed based on sequence information 
alone and are synthesized in situ using a combination of photolithography 
and oligonucleotide chemistry. RNAs present at a frequency of 1:300,000 
are unambiguously detected, and detection is quantitative over more than 
three orders of magnitude. This approach provides a way to use directly the 
growing body of sequence information for highly parallel experimental 
investigations. Because of the combinatorial nature of the chemistry and the 
ability to synthesize small arrays containing hundreds of thousands of 
specifically chosen oligonucleotides, the method is readily scalable to the 
simultaneous monitoring of tens of thousands of genes. The Affymetrix 
integrated GeneChip arrays contain up to 500,000 unique probes 
corresponding to tens of thousands of gene expression measurements. 


Affymetrix manufactures arrays that monitor the global activities of genes in 
yeast, Arabidopsis, Drosophila, mice, rats, and humans. In addition, custom 
expression arrays can be designed for other model organisms, proprietary 
sequences, or specific subsets of known genes. For human arrays, expressed 
sequences from databases are collected and clustered into groups of similar 
sequences. Using clusters as a starting point, sequences are further 
subdivided into subclusters representing distinct transcripts. This 
categorization process involves alignment to the human genome, which 
reveals splicing and polyadenylation variants. 


Oligonucleotide chips. 
A typical experiment with an oligonucleotide chip: 
preparation of a sample for GeneChip arrays. 
Messenger RNA (mRNA) is extracted from the 
cell and converted to cDNA. It then undergoes 
an amplification and labeling step before 
fragmentation and hybridization to 25-mer oligos 
on the surface of the chip. After unhybridized 
material is washed away, the chip is scanned in a 
confocal laser scanner and the image is analyzed by 
computer. 


Note: Oligonucleotide arrays - detailed information 


Note: cDNA arrays 


Oligonucleotide Arrays-Detailed Information 


Detailed Information on the Oligonucleotide Arrays. 


A core element of array design, the Perfect Match/Mismatch probe strategy, 
is universally applied to the production of GeneChip arrays. For each 
probe designed to be perfectly complementary to a target sequence, a 
partner probe is generated that is identical except for a single base 
mismatch in its center. These probe pairs, each consisting of a Perfect Match 
probe (PM) and a Mismatch probe (MM), allow the quantitation and 
subtraction of signals caused by non-specific cross-hybridization (further 
web presentation). The difference in hybridization signals between the 
partners, as well as their intensity ratios, serve as indicators of specific 
target abundance. 
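
A small Python sketch may help to make the PM/MM idea concrete. It is not the vendor's algorithm: the rule used to build the mismatch probe (replacing the central base by its complement), the plain averaging over probe pairs, and the intensity values are all assumptions chosen for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match: str) -> str:
    """Copy the PM probe but swap the central base for its complement (assumed rule)."""
    mid = len(perfect_match) // 2
    return perfect_match[:mid] + COMPLEMENT[perfect_match[mid]] + perfect_match[mid + 1:]


def probe_set_summary(pm: list[float], mm: list[float]) -> tuple[float, float]:
    """Return (mean PM - MM difference, mean PM / MM ratio) over the probe pairs."""
    diffs = [p - m for p, m in zip(pm, mm)]
    ratios = [p / m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs), sum(ratios) / len(ratios)


print(mismatch_probe("ACGTA"))  # ACCTA

# Hypothetical intensities for a probe set of four PM/MM pairs.
pm = [1200.0, 900.0, 1500.0, 800.0]
mm = [300.0, 300.0, 500.0, 400.0]
print(probe_set_summary(pm, mm))  # (725.0, 3.0)
```

A large positive mean difference together with a ratio well above 1 suggests specific hybridization; values near 0 and 1 would suggest that the signal is mostly non-specific.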

GeneChip Expression Array Design 
The Affymetrix GeneChip technology. 
There may be 5,000-20,000 probe sets per 
chip. The presence of messenger RNA 
(mRNA) is detected by a series of probe 
pairs that differ in only one nucleotide. 
Hybridization of fluorescent mRNA to 
these probe pairs on the chip is detected 
by laser scanning of the chip surface. A 
probe set consists of 11-20 PM/MM pairs. 


Probe synthesis occurs in parallel, resulting in the addition of an A, C, T, or 
G nucleotide to multiple growing chains simultaneously. To define which 
oligonucleotide chains will receive a nucleotide in each step, 
photolithographic masks, carrying 18 to 20 square micron windows that 
correspond to the dimensions of individual features, are placed over the 
coated wafer. The windows are distributed over the mask based on the 
desired sequence of each probe. When ultraviolet light is shone over the 
mask in the first step of synthesis, the exposed linkers become deprotected 
and are available for nucleotide coupling. Critical to this step is the precise 
alignment of the mask with the wafer before each synthesis step. The 
nucleotide attaches to the activated linkers, initiating the synthesis process. 
In the following synthesis step, another mask is placed over the wafer to 
allow the next round of deprotection and coupling. The process is repeated 
until the probes reach their full length, usually 25 nucleotides. 
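
The masking logic described above can be imitated in a few lines of Python. This is a simplified sketch, not the manufacturing procedure: it assumes that the nucleotides are offered in a fixed repeating order (A, C, G, T), and the probe sequences are invented. In each step the "mask" is simply the list of probes whose next required base matches the nucleotide being offered.

```python
def synthesis_schedule(probes: list[str], cycle: str = "ACGT") -> list[tuple[str, list[int]]]:
    """Return one (nucleotide, mask) entry per synthesis step.

    The mask lists the indices of the probes that are exposed to light and
    therefore receive the nucleotide in that step.
    """
    if any(base not in cycle for probe in probes for base in probe):
        raise ValueError("probes may only contain the nucleotides in the cycle")
    grown = [0] * len(probes)  # number of bases already added to each probe
    steps = []
    i = 0
    while any(g < len(p) for g, p in zip(grown, probes)):
        base = cycle[i % len(cycle)]
        mask = [k for k, p in enumerate(probes)
                if grown[k] < len(p) and p[grown[k]] == base]
        for k in mask:
            grown[k] += 1
        if mask:  # steps in which no probe needs the base are skipped
            steps.append((base, mask))
        i += 1
    return steps


# Two hypothetical three-base probes (real probes are usually 25 bases long).
for base, mask in synthesis_schedule(["ACG", "CAT"]):
    print(base, mask)
```

For these two probes the schedule has five steps, and the second step adds a C to both growing chains at once, which is the parallelism the text refers to.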
Using technologies adapted from the 
semiconductor industry, GeneChip 
manufacturing begins with a 5-inch square 
quartz wafer (Affymetrix). Initially the quartz 
is washed to ensure uniform hydroxylation 
across its surface. The wafer is placed in a 
bath of silane, which reacts with the 
hydroxyl groups of the quartz, and forms a 
matrix of covalently linked molecules. Each 
of these features harbors millions of 
identical DNA molecules. The silane film 
provides a uniform hydroxyl density to 
initiate probe assembly. Linker molecules, 
attached to the silane matrix, provide a 
surface that may be spatially activated by 
light. 


Once the synthesis is completed, the wafers are deprotected, diced, and the 
resulting individual arrays are packaged in flow cell cartridges. Depending 
on the number of probe features per array, a single wafer can yield between 
49 and 400 arrays. The manufacturing process ends with a comprehensive 
series of quality control tests. 


The design and manufacture of GeneChip probe arrays are highly 
stereotyped and consistent, eliminating the need to make arrays in 
individual labs, thereby significantly reducing user setup time and 
providing a higher degree of reproducibility between experiments. Taking 
advantage of these capabilities, researchers have used GeneChip probe 
arrays to study the regulation of gene expression associated with a wide 
variety of basic biological functions, including development, hormonal 
signaling, and circadian rhythms. Many studies have also used GeneChip 
probe arrays to tackle disease. A rapidly growing area of application is 
cancer research, in which arrays have helped researchers 
discover new tumor classes, assign patient samples to known tumor classes, 
reveal cancer-related alterations in molecular pathways, predict clinical 
outcomes, and identify new drug targets (Shipp et al., 2002; Pomeroy et al., 
2002; Schadt et al., 2001; Golub et al., 1999; Lockhart et al., 1996). 


Standard eukaryotic gene expression assay 
(Affymetrix). The basic concept behind the 
use of GeneChip arrays for gene expression 
is simple: labeled cDNA or cRNA targets 
derived from the mRNA of an 
experimental sample are hybridized to 
nucleic acid probes attached to the solid 
support. By monitoring the amount of label 
associated with each DNA location, it is 
possible to infer the abundance of each 
mRNA species represented. Although 
hybridization has been used for decades to 
detect and quantify nucleic acids, the 
combination of the miniaturization of the 
technology and the large and growing 
amounts of sequence information has 
enormously expanded the scale at which 
gene expression can be studied. 


Note: data analysis 


Note: cDNA arrays 


Data Analysis 


Data Analysis. 


After scanning, a grid must be placed on the image and the spots 
representing the arrayed genes must be identified. The background 
fluorescence is calculated locally for each spot and is subtracted from the 
hybridization intensities. Comparing the fluorescence intensities of the control 
and experimental probes hybridized to each spot identifies differentially 
expressed genes (Freeman et al., 2000; Bowtell, 1999; Knudsen, 2002). 
Typically, the experimental target sequences are labeled with Cy5, which 
fluoresces red light (667 nm), and control targets are labeled with Cy3, 
which fluoresces green light (568 nm). The ratio of red to green signal can 
then be used as a measure of the effect of the experimental treatment on the 
expression of each gene. A ratio of 1 (yellow spot) indicates no change in 
the expression level between experimental and control samples, while a 
ratio greater than 1 (red spot) indicates increased transcription in the 
experimental sample, and a ratio less than 1 (green spot) indicates 
decreased transcription in the experimental sample. A scatter plot is a very 
useful representation of the expression data: the signal intensities of the 
experimental and control samples are plotted along the x- and y-axes, and 
the ratio values appear as a distance from the diagonal (Schena, 2003). 
The diagonal separates spots with higher activity than the control sample 
from spots with lower activity than the control. The scatter plot thus provides a 
visualization of the fluorescence ratios obtained from the experimental and 
control samples. One can then easily choose points that represent a several-fold 
increase or decrease in gene expression and focus additional analyses 
on these genes. 
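
The ratio calculation described above can be sketched in Python as follows. The intensity values are invented, the use of a log2 scale is a common convention rather than something stated in the text, and the two-fold cutoff is an arbitrary assumption.

```python
import math


def log_ratio(red: float, red_bg: float, green: float, green_bg: float) -> float:
    """log2 of background-subtracted Cy5 (experimental) over Cy3 (control)."""
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        raise ValueError("signal does not exceed local background")
    return math.log2(r / g)


def classify(log2_ratio: float, fold: float = 2.0) -> str:
    """Label a spot using a fold-change cutoff (the 2-fold default is an assumption)."""
    cutoff = math.log2(fold)
    if log2_ratio >= cutoff:
        return "up"        # red spot
    if log2_ratio <= -cutoff:
        return "down"      # green spot
    return "unchanged"     # yellow spot


# Hypothetical spots: (red, red background, green, green background).
spots = {
    "geneX": (4100.0, 100.0, 1100.0, 100.0),
    "geneY": (600.0, 100.0, 2100.0, 100.0),
    "geneZ": (1100.0, 100.0, 1100.0, 100.0),
}
for name, values in spots.items():
    m = log_ratio(*values)
    print(name, m, classify(m))
```

On the log2 scale a ratio of 1 maps to 0, so increases and decreases of the same fold size lie symmetrically around zero, which is one reason this scale is often preferred.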


The hybridized microarray. 


A hybridized microarray printed by the AECOM 
robot (Cheung et al., 1999). A 5550-gene mouse 
cDNA microarray was printed and hybridized to 
Cy3-dUTP and Cy5-dUTP probes from wild-type 
and mutant mouse cell lines and imaged using 
the AECOM laser scanner. Shown is one out of 
four of the pen tip printing areas of the array. 


With just one experimental condition and a control, the data analysis is 
limited to a list of regulated genes ranked by fold-change or by the 
significance of the change as determined by a t-test. Normalization of the data 
must be performed before separate arrays can be compared. With multiple 
experimental conditions (e.g. time-points or drug doses), the genes are often 
grouped into clusters that behave similarly under the different conditions. 
Computational methods such as hierarchical clustering or k-means are used 
to analyze the massive amounts of data generated by these experiments. 
Gene clusters are visualized with trees or color-coded matrices by placing 
genes with similar patterns of expression into a clustered group (Figure 11). 
Image processing and analysis software is commercially available, and 
several packages are available as freeware: 

http://www.tigr.org/software/, 
http://research.nhgri.nih.gov/microarray/main.html, 
http://www.bio.davidson.edu/projects/magic/magic.html. 
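
As an illustration of how genes with similar expression patterns can be grouped, here is a minimal pure-Python sketch of agglomerative (hierarchical) clustering. It is not taken from any of the packages listed above. The choice of average linkage, the distance 1 - r based on the Pearson correlation, and the four expression profiles are all assumptions made for the example; profiles with no variation would need special handling.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clusters(profiles: dict, n_clusters: int) -> list:
    """Average-linkage agglomerative clustering with distance 1 - Pearson r."""
    clusters = [[gene] for gene in profiles]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical log-ratio profiles over four time points.
profiles = {
    "g1": [0.1, 1.0, 2.0, 3.1],
    "g2": [0.0, 0.9, 2.2, 2.9],
    "g3": [3.0, 2.1, 1.0, 0.2],
    "g4": [2.8, 2.0, 0.9, 0.0],
}
print(hierarchical_clusters(profiles, n_clusters=2))  # [['g1', 'g2'], ['g3', 'g4']]
```

The two induced genes end up in one cluster and the two repressed genes in the other; recording the order of the merges would give the tree usually drawn next to a color-coded matrix.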


Clustering of gene expression patterns. 
a, The ratio of gene 
expression in control relative to 
experimental for individual genes is 
displayed using a color scale. Black 
indicates no change in expression, 
while an increase in the experimental 
relative to the control is shown as 
red, and a decrease in the 
experimental relative to the control is 
shown as green. Genes displaying 
similar patterns of induction or 
repression are clustered together. b, 
clustering of thousands of genes by 
patterns of gene induction or 
repression following a treatment, 
(Campbell and Heyer, 2003). 


Microarray analysis of gene expression does have limitations that 
researchers must consider. The levels of induced mRNA and of 
induced protein do not always correlate well. 
Translational and post-translational regulatory mechanisms that impact the 
activity of various cellular proteins are not examined by DNA microarrays, 
though the emerging field of proteomics is beginning to address this issue. 
Other limitations of microarray analysis include the impact of alternative 
splicing during transcript processing and the limited detectability of 
unstable mRNAs. Differential gene expression results must be confirmed 
through direct examination of selected genes. These analyses are typically 
at the level of RNA blot or quantitative RT-PCR to examine transcripts of a 
specific gene, and/or detection of protein concentration using immunoblots. 
Additional studies often include alteration of gene function with targeted 
mutations, antisense technology, or protein inhibition. 


Note: cDNA arrays 


Note: Oligonucleotide arrays 


Note: Gene Networks 


Boolean Networks 


Introduction 


A central goal of molecular biology is to understand the regulation of 
protein synthesis and its reactions to external and internal signals. All the 
cells in an organism carry the same genomic data, yet their protein makeup 
can be drastically different both temporally and spatially, due to regulation. 
Protein synthesis is regulated by many mechanisms at its different stages. 
These include mechanisms for controlling transcription initiation, RNA 
splicing, mRNA transport, translation initiation, post-translational 
modifications, and degradation of mRNA/protein. One of the main 
junctions at which regulation occurs is mRNA transcription. A major role in 
this machinery is played by proteins themselves that bind to regulatory 
regions along the DNA, greatly affecting the transcription of the genes they 
regulate. In recent years, technical breakthroughs in spotting hybridization 
probes and advances in genome sequencing efforts have led to the development of 
DNA microarrays, which consist of many species of probes, either 
oligonucleotides or cDNA, that are immobilized in a predefined 
organization on a solid phase. By using DNA microarrays, researchers are 
now able to measure the abundance of thousands of mRNA targets 
simultaneously (DeRisi et al., 1997; Lockhart et al., 1996; Wen et al., 
1998). Unlike classical experiments, where the expression levels of only a 
few genes were reported, DNA microarray experiments can measure all the 
genes of an organism, providing a “genomic” viewpoint on gene 
expression. As a consequence, this technology facilitates new experimental 
approaches for understanding gene expression and regulation (Iyer et al., 
1999; Spellman et al., 1998). 


A central focus of genomic research concerns understanding the manner in 
which cells execute and control the enormous number of operations 
required for their function. Biological systems behave in an exceedingly 
parallel and extraordinarily integrated fashion. Feedback and damping are 
routine even for the most common activities. Thus, in this area of genomic 
biology, single gene perspectives are becoming increasingly limited for 
gaining insight into biological processes. Network applications are 
becoming increasingly important for making progress in our understanding 
of the manner in which genes and molecules collectively form a biological 
system and harnessing this understanding in educated intervention for 
correcting human diseases. Such approaches inevitably require 
computational and formal methods to process massive amounts of data, 
understand general principles governing the system under study, and make 
useful predictions about system behavior in the presence of known 
conditions. There is a rather wide spectrum of approaches for modeling 
gene regulatory networks, each with its own assumptions, data 
requirements, and goals. The group of the most popular models includes: 
Boolean, Probabilistic Boolean and Bayesian networks. 


Boolean Networks 


The Boolean network model, introduced by Kauffman (Kauffman, 1969, 
1974; Kauffman and Glass, 1973) and more recently developed by 
Shmulevich (Shmulevich, 2002), has received the most attention, not only 
from the biology community, but also in physics. In this model, gene 
expression is quantized to only two levels: ON and OFF. The expression 
level (state) of each gene is functionally related to the expression states of 
some other genes, using logical rules. A Boolean network G(V,F) is defined 
by a set of nodes corresponding to genes V = {x1,..., xn} and a list of 
Boolean functions F = (f1,..., fn), one for each gene. The state of a node 
(gene) at time t+1 is completely determined by the values of its input nodes 
at time t by means of the underlying Boolean function. The model is 
represented in the form of a directed graph. Each xi represents the state 
(expression) of gene i, where xi=1 means that gene i is expressed and xi=0 
means that it is not expressed. The list of Boolean functions F represents 
the rules of regulatory interactions between genes. That is, any given gene 
transforms its inputs (regulatory factors that bind to it) into an output, 
which is the state or expression of the gene itself. The maximum 
connectivity of a Boolean network is defined by K = max_i(ki), where ki is 
the number of inputs to function fi. All genes are assumed to update 
synchronously in accordance with the functions assigned to them and this 
process is then repeated. The artificial synchrony simplifies computation 
while preserving the qualitative, generic properties of global network 
dynamics (Kauffman, 1993; Huang, 1999; Wuensche, 1998). 
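
The synchronous update rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3-gene network, its rules, and the names `step` and `rules` are hypothetical and are not taken from any figure in this text.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]
# One rule per gene: (indices of the input genes, Boolean function of those inputs)
Rule = Tuple[Sequence[int], Callable[..., int]]


def step(state: State, rules: Sequence[Rule]) -> State:
    """Synchronously update every gene: compute x(t+1) from x(t)."""
    return tuple(int(f(*(state[j] for j in inputs))) for inputs, f in rules)


# Hypothetical 3-gene network, invented for illustration only.
rules = [
    ((1, 2), lambda a, b: a and b),  # x1(t+1) = x2(t) AND x3(t)
    ((0,), lambda a: a),             # x2(t+1) = x1(t)
    ((0,), lambda a: not a),         # x3(t+1) = NOT x1(t)
]

# Maximum connectivity K = max_i(ki), the largest number of inputs of any gene.
K = max(len(inputs) for inputs, _ in rules)

# Iterate the network from an arbitrary initial state.
state = (0, 1, 1)
trajectory = [state]
for _ in range(4):
    state = step(state, rules)
    trajectory.append(state)

print("K =", K)
print(trajectory)
```

All genes read the same old state before any of them is overwritten, which is what makes the update synchronous.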


An example is presented below. Consider a Boolean network consisting of 
5 genes {x1,..., x5} with the corresponding Boolean functions given by 
the truth tables shown in Figure 1. The maximum connectivity is K=3, 
although we allow some input variables to be duplicated, essentially 
reducing the connectivity. The dynamics of this Boolean network are shown 
in Figure 2. Since there are 5 genes, there are 2^5 = 32 possible states 
that the network can be in. Each state is represented by a circle and the 
arrows between states show the transitions of the network according to the 
functions in Table 1 (Figure 1). Because transitions in a Boolean network 
are deterministic and the number of possible states is finite, every 
trajectory must eventually revisit a state and from then on repeat the same 
cycle of states indefinitely. 
Figure 1 (Table 1). Truth tables of the functions in a Boolean network 
with 5 genes. The indices j1, j2, and j3 indicate the input connections for 
each of the functions. 

Figure 2. The state-transition diagram for the Boolean network defined in 
Table 1 (Figure 1). 
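
Since the state space is finite, a state-transition structure of this kind can be explored by brute force. The sketch below enumerates all 2^n states of a small network and collects the cycles that trajectories fall into. The 3-gene rules and the names `next_state` and `find_attractors` are hypothetical and chosen only for illustration; they are not the 5-gene functions of Table 1.

```python
from itertools import product


def next_state(x):
    """Hypothetical 3-gene update rule, invented for illustration."""
    x1, x2, x3 = x
    return (x2, x1, x1 & x2)


def find_attractors(n, step):
    """Follow each of the 2**n states until a state repeats."""
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = []
        x = start
        while x not in seen:
            seen.append(x)
            x = step(x)
        # States from the first repeated state onward are revisited forever.
        cycle = seen[seen.index(x):]
        # Rotate the cycle so that each one has a single canonical form.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


n = 3
num_states = 2 ** n
attractors = find_attractors(n, next_state)

print(num_states, "states")
for cycle in sorted(attractors):
    print("cycle of length", len(cycle), ":", cycle)
```

Exhaustive enumeration is practical only for small n, because the number of states doubles with every added gene.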


In the context of Boolean networks as models of genetic regulatory 
networks, there is no doubt that the binary approximation of gene 
expression is an oversimplification (Huang, 1999). However, even though 
most biological phenomena manifest themselves in the continuous domain, 
they are often described in a binary logical language such as ‘on and off,’ 
‘upregulated and downregulated’, and ‘responsive and nonresponsive.’ 
Several examples show that a Boolean formalism is meaningful in biology. 
In (Shmulevich and Zhang, 2002), the authors reasoned that if genes, when 
quantized to only two levels (1 or 0), were not informative in separating 
known sub-classes of tumors, then there would be little hope for Boolean 
modeling of realistic genetic networks based on gene expression data. 


Fortunately, the results were very promising. Using binary gene 
expression data generated via cDNA microarrays, with the Hamming 
distance as a similarity metric, a clear separation between different 
sub-types of gliomas, as well as between different sarcomas, was shown. 
This suggests that a good deal of meaningful biological information, to 
the extent that it is contained in the measured continuous-domain gene 
expression data, is retained when the data are binarized. 
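
The idea of comparing binarized expression profiles by Hamming distance can be sketched as follows. Everything specific here is an assumption made for illustration: the fixed per-gene thresholds, the sample names, and the toy expression values are invented and do not come from the cited study, which may have binarized its data differently.

```python
def binarize(profile, thresholds):
    """Quantize each expression value to 1 (above threshold) or 0."""
    return [1 if value > t else 0 for value, t in zip(profile, thresholds)]


def hamming(a, b):
    """Number of positions at which two binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy continuous expression values for four genes (invented numbers).
samples = {
    "sample_A": [2.1, 0.3, 1.8, 0.2],
    "sample_B": [1.9, 0.4, 2.2, 0.1],
    "sample_C": [0.2, 1.7, 0.3, 2.5],
}
thresholds = [1.0, 1.0, 1.0, 1.0]  # assumed per-gene cut-offs

binary = {name: binarize(p, thresholds) for name, p in samples.items()}

d_ab = hamming(binary["sample_A"], binary["sample_B"])
d_ac = hamming(binary["sample_A"], binary["sample_C"])
print("A vs B:", d_ab)
print("A vs C:", d_ac)
```

Samples whose binary profiles are close in Hamming distance would be grouped together by any distance-based clustering method.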


Biological Example 


An example is presented below, borrowed from (Shmulevich et al., 2002), 
showing the logical representation of cell cycle regulation. This process of 
cellular growth and division is highly regulated. An imbalance in this 
process results in unregulated cell growth in diseases such as cancer. In 
order for cells to move from the G1 phase to the S phase, when the genetic 
material, DNA, is replicated for the daughter cells, a series of molecules 
such as cyclin E and cyclin dependent kinase 2 (cdk2) work together to 
phosphorylate the retinoblastoma (Rb) protein and inactivate it, thus 
releasing cells into the S phase. Cdk2/cyclin E is regulated by two switches: 
the positive switch complex called cdk activating kinase (CAK) and the 
negative switch p21/WAF1. The CAK complex can be composed of two 
gene products: cyclin H and cdk7. When cyclin H and cdk7 are present, the 
complex can activate cdk2/cyclin E. A negative regulator of cdk2/cyclin E 
is p21/WAF1, which in turn can be activated by p53. When p21/WAF1 
binds to cdk2/cyclin E, the kinase complex is turned off (Gartel and Tyner, 
1999). Further, p53 can inhibit cyclin H, a positive regulator of cyclin 
E/cdk2 (Schneider et al., 1998). This negative regulation is an important 
defensive system in the cells. For example, when cells are exposed to a 
mutagen, DNA damage occurs. It is to the benefit of cells to repair the 
damage before DNA replication so that the damaged genetic material is 
not passed on to the next generation. An extensive amount of work has 
demonstrated that DNA damage triggers switches that turn on p53, which 
then turns on p21/WAF1. p21/WAF1 then inhibits cdk2/cyclin E, so Rb 
becomes activated and DNA synthesis stops. As an extra measure, p53 also 
inhibits cyclin H, thus turning off the switch that turns on cdk2/cyclin E. 
Such delicate genetic switch networks in the cells are the basis for cellular 
homeostasis, the ability of an organism to maintain equilibrium. 


For purposes of illustration, let us consider a simplified diagram, shown in 
Figure 3, illustrating the effects of cdk7/cyclin H, cdk2/cyclin E, and 
p21/WAF1 on Rb. Thus, p53 and other known regulatory factors are not 
considered. While this diagram represents the above relationships from a 
pathway perspective, one may also represent the activity of Rb in terms of 
the other variables in a logic-based fashion. Figure 4 contains a logic 
circuit diagram of the activity of Rb (‘on’ or ‘off’) as a Boolean function 
of four input variables: cdk7, cyclin H, cyclin E, and p21/WAF1. Note that 
cdk2 is shown to be completely determined by the values of cdk7 and 
cyclin H using the AND operation and thus, cdk2 is not an independent 
input variable. Also, in Figure 3, p21/WAF1 is shown to have an inhibitory 
effect on the cdk2/cyclin E complex, which in turn regulates Rb, while in 
Figure 4, we see that from a logic-based perspective, the value of 
p21/WAF1 works together with cdk2 and cyclin E to determine the value of 
Rb. 


Figure 3. A diagram illustrating the cell cycle regulation example. 
Arrowed lines represent activation and lines with bars at the end represent 
inhibition. 

Figure 4. The logic diagram describing the activity of retinoblastoma (Rb) 
protein in terms of 4 inputs: cdk7, cyclin H, cyclin E, and p21/WAF1. The 
gate with inputs cdk7 and cyclin H is an AND gate, the gate with input 
p21/WAF1 is a NOT gate, and the gate whose output is Rb is a NAND 
(negated AND) gate. 
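
The gate description above translates directly into a Boolean function. The following sketch encodes the three gates as stated (AND, NOT, NAND) and tabulates Rb for all 16 input combinations. The function name `rb_active` and the use of Python booleans are illustrative choices, not part of the original model description.

```python
from itertools import product


def rb_active(cdk7, cyclin_h, cyclin_e, p21):
    """Activity of Rb as a Boolean function of the four inputs."""
    cdk2 = cdk7 and cyclin_h          # AND gate: CAK (cdk7 + cyclin H) activates cdk2
    not_p21 = not p21                 # NOT gate: p21/WAF1 acts as an inhibitor
    return not (cdk2 and cyclin_e and not_p21)  # NAND gate whose output is Rb


# Full truth table over (cdk7, cyclin H, cyclin E, p21/WAF1).
table = {
    inputs: rb_active(*inputs)
    for inputs in product((False, True), repeat=4)
}

for inputs, rb in table.items():
    print([int(v) for v in inputs], "-> Rb =", int(rb))
```

In this encoding Rb is switched off only when cdk7, cyclin H and cyclin E are all present and p21/WAF1 is absent, which matches the pathway description: an active cdk2/cyclin E complex inactivates Rb unless p21/WAF1 blocks it.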


Probabilistic Boolean and Bayesian Networks 


Probabilistic Boolean Networks 


In a Boolean network, each (target) gene is ‘predicted’ by several other 
genes by means of a Boolean function (predictor). Thus, after having 
inferred such a function from gene expression data, one could conclude 
that by observing the values of the predictive genes we know, with full 
certainty, the value of the target gene. Conceptually, such inherent 
determinism seems problematic, as it assumes an environment with no 
uncertainty. However, the data used for the inference exhibit 
uncertainty on several levels. 


Another class of models, called Probabilistic Boolean Networks (PBNs) 
(Shmulevich et al., 2002), shares the appealing properties of Boolean 
networks, but is able to cope with uncertainty, both in the data and in 
model selection. A model incorporates only a partial description of a 
physical system. This means that a Boolean function giving the next state of 
a variable is likely to be only partially accurate. 


The basic idea is to extend the Boolean network to accommodate more than 
one possible function for each node. Thus, to every node xi there 
corresponds a set Fi = {fj}, j = 1,..., l(i), where each fj is a possible 
function determining the value of gene xi and l(i) is the number of possible 
functions for gene xi. A realization of the PBN at a given instant of time is 
determined by a vector of Boolean functions, where the ith element of that 
vector contains the predictor selected at that instant for gene xi. In other 
words, the vector function fk: {0,1}^n → {0,1}^n acts as a transition 
function (mapping) representing a possible realization of the entire PBN. 
Such functions are commonly referred to as multiple-output Boolean 
functions. Each of the N possible realizations can be thought of as a 
standard Boolean network that operates for one time step. In other words, 
at every state x(t) ∈ {0,1}^n, one of the N Boolean networks is chosen and 
used to make the transition to the next state x(t+1) ∈ {0,1}^n. The 
probability Pi that the ith (Boolean) network or realization is selected can 
be easily expressed in terms of the individual selection probabilities cj; 
see (Shmulevich et al., 2002). The dynamics of the PBN are essentially the 
same as for Boolean networks, but at any given point in time, the value of 
each node is determined by one of the possible predictors, chosen according 
to its corresponding probability. This can be interpreted by saying that at 
any point in time, we have one out of N possible networks. The basic 
building block of a PBN is shown in Figure 1. 

Figure 1. A basic building block of a probabilistic Boolean network. A 
number of predictors share common inputs while their outputs are 
synthesized, in this case by random selection, into a single output. This 
type of structure is known as a synthesis filter bank in the digital signal 
processing literature. The wiring diagram for the entire PBN would consist 
of n such building blocks. Although the ‘wiring’ of the inputs to each 
function is shown to be quite general, in practice, each function 
(predictor) has only a few input variables. 
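
One transition of a PBN can be sketched as follows. This is a minimal illustration under stated assumptions: the 3-gene network, its predictors and the selection probabilities are invented, and the predictor for each gene is assumed to be selected independently of the others. Under that independence assumption, the number of realizations is the product of the l(i) and the probability of a realization is the product of the selected predictors' probabilities.

```python
import random
from math import prod

# For each gene: a list of (selection probability c_j, predictor f_j).
# Hypothetical network, invented for illustration only.
pbn = [
    [(0.7, lambda x: x[1] & x[2]), (0.3, lambda x: x[1] | x[2])],  # gene x1, l(1) = 2
    [(1.0, lambda x: x[0])],                                       # gene x2, l(2) = 1
    [(0.5, lambda x: 1 - x[0]), (0.5, lambda x: x[2])],            # gene x3, l(3) = 2
]

# Number of possible realizations, assuming independent predictor selection.
N = prod(len(predictors) for predictors in pbn)


def pbn_step(state, pbn, rng):
    """Pick one predictor per gene at random, then update synchronously."""
    nxt = []
    for predictors in pbn:
        weights = [c for c, _ in predictors]
        funcs = [f for _, f in predictors]
        f = rng.choices(funcs, weights=weights, k=1)[0]
        nxt.append(f(state))
    return tuple(nxt)


def realization_probability(choice, pbn):
    """Probability of the realization that uses predictor choice[i] for gene i."""
    return prod(pbn[i][j][0] for i, j in enumerate(choice))


rng = random.Random(0)  # fixed seed so that runs are repeatable
state = (0, 1, 1)
for _ in range(5):
    state = pbn_step(state, pbn, rng)
    print(state)
```

Setting every l(i) to 1 recovers an ordinary Boolean network, since there is then only one realization and it is selected with probability 1.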


Bayesian Networks 


The well-studied statistical tool of Bayesian networks (Friedman et al., 
2000; Pearl, 1988) represents the dependence structure between multiple 
interacting quantities (e.g., expression levels of different genes). Bayesian 
networks are a promising tool for analyzing gene expression patterns. First, 
they are particularly useful for describing processes composed of locally 
interacting components; that is, the value of each component directly 
depends on the values of a relatively small number of components. Second, 
statistical foundations for learning Bayesian networks from observations, 
and computational algorithms to do so, are well understood and have been 
used successfully in many applications. Finally, Bayesian networks provide 
models of causal influence: although Bayesian networks are 
mathematically defined strictly in terms of probabilities and conditional 
independence statements, a connection can be made between this 
characterization and the notion of direct causal influence (Heckerman et 
al., 1999; Pearl and Verma, 1991; Spirtes et al., 1993). Although this 
connection depends on several assumptions that do not necessarily hold in 
gene expression data, the conclusions of Bayesian network analysis might 
be indicative of some causal connections in the data. 


A Bayesian network (also known as a causal probabilistic network) is an 
annotated directed acyclic graph that encodes a joint probability distribution 
of a set of random variables X. Formally, a Bayesian network for X is a pair 
B=(G,Q). The first component, G, is a directed acyclic graph (DAG) whose 
vertices correspond to the random variables x1, ..., xn, and whose edges 
represent direct dependencies between the variables. The graph G encodes 
the Markov assumption: each variable xi is independent of its 
nondescendants, given its parents in G. The second component of the pair, 
Q, represents the set of parameters that quantifies the network and 
describes a conditional distribution for each variable, given its parents in 
G. Together, these two components specify a unique distribution on 
x1,..., xn. The conditional independence assumptions represented by G 
allow the joint distribution to be decomposed into a product of these 
conditional distributions, economizing on the number of parameters. Given 
a Bayesian network, we might want to answer many types of questions that 
involve the joint probability (e.g., what is the probability of X = x given 
observation of some of the other variables?) or independencies in the 
domain (e.g., are X and Y independent once we observe Z?). The literature 
contains a suite of algorithms that can answer such queries efficiently by 
exploiting the explicit representation of structure (Jensen, 1996; Pearl, 
1988). 
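
The two kinds of queries mentioned above can be illustrated on a very small network. The sketch below assumes a hypothetical three-variable graph in which Z is the only parent of both X and Y; all probabilities are invented numbers. The joint distribution is the product of each variable's conditional distribution given its parents, and queries are answered by brute-force enumeration rather than by the efficient algorithms cited in the text.

```python
from itertools import product

# Hypothetical parameters Q for the graph Z -> X, Z -> Y (invented numbers).
p_z = {1: 0.3, 0: 0.7}           # P(Z = z)
p_x_given_z = {1: 0.9, 0: 0.2}   # P(X = 1 | Z = z)
p_y_given_z = {1: 0.8, 0: 0.1}   # P(Y = 1 | Z = z)


def bernoulli(p_one, value):
    """Probability of a binary value, given the probability of value 1."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x, y, z):
    """P(x, y, z) = P(z) * P(x | z) * P(y | z), as implied by the graph."""
    return p_z[z] * bernoulli(p_x_given_z[z], x) * bernoulli(p_y_given_z[z], y)


def prob(query, evidence=None):
    """P(query | evidence); both are dicts keyed by 'x', 'y', 'z'."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for x, y, z in product((0, 1), repeat=3):
        assignment = {"x": x, "y": y, "z": z}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(x, y, z)
        denominator += p
        if all(assignment[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


print("P(X=1)          =", prob({"x": 1}))
print("P(X=1 | Y=1)    =", prob({"x": 1}, {"y": 1}))
print("P(X=1 | Z=1)    =", prob({"x": 1}, {"z": 1}))
print("P(X=1 | Y=1,Z=1)=", prob({"x": 1}, {"y": 1, "z": 1}))
```

In this toy network, observing Y changes the probability of X, but once Z is known, Y carries no further information about X. The factored form needs 5 parameters here, compared with 7 for an unrestricted joint distribution over three binary variables.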


Biological Example 


Let us apply the approach to the data of Spellman et al. (1998). This 
data set contains 76 gene expression measurements of the mRNA levels of 
6177 S. cerevisiae ORFs. These experiments measure six time series under 
different cell cycle synchronization methods. Spellman et al. (1998) 
identified 800 genes whose expression varied over the different cell-cycle 
stages. In learning from this data, one treats each measurement as an 
independent sample from a distribution and does not take into account the 
temporal aspect of the measurement. Since the cell cycle process is clearly 
of a temporal nature, this is compensated for by introducing an 
additional variable denoting the cell cycle phase. This variable is forced to 
be a root in all the networks learned. Its presence allows one to model the 
dependency of expression levels on the current cell cycle phase. Two 
experiments were performed, one with the discrete multinomial 
distribution, the other with the linear Gaussian distribution. The learned 
features show that intricate structure can be recovered even from such small 
data sets. It is important to note that the learning algorithm uses no prior 
biological knowledge or constraints. All learned networks and relations are 
based solely on the information conveyed in the measurements themselves. 
These results are available at the following web page: 

the graphical display of some results from this analysis. 

SVS1 Gene Interaction Network 


The graph shows a local Bayesian network for the 
gene SVS1. The width (and color) of edges 
corresponds to the computed confidence level. An 
edge is directed if there is a sufficiently high 
confidence in the order between the genes connected by 
the edge. This local map shows that CLN2 separates 
SVS1 from several other genes. Although there is a 
strong connection between CLN2 and all these genes, 
there are no other edges connecting them. This 
indicates that, with high confidence, these genes are 
conditionally independent given the expression 
level of CLN2. 


Glossary 


ADENINE 
One of the four bases in DNA that make up the letters ATGC, adenine 
is the "A". The others are guanine, cytosine, and thymine. Adenine 
always pairs with thymine. (from National Human Genome Research 
Institute) 


ALA (ALANINE)(A) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


AMINO ACIDS 
A group of 20 different kinds of small molecules that link together in 
long chains to form proteins. Often referred to as the "building blocks 
of proteins." (from National Human Genome Research Institute) 


ANAPHASE 
the stage of meiosis or mitosis when chromosomes move toward 
opposite ends of the nuclear spindle. (from WordNet) 


ARG (ARGININE)(R) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


ASN (ASPARAGINE)(N) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


ASP (ASPARTIC ACID)(D) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


ASX (ASPARAGINE OR ASPARTIC ACID)(B) 
A symbol used when an amino acid residue may be either asparagine or 
aspartic acid, two of the twenty naturally occurring amino acids. 
(adapted from BioTech Dictionary) 


BASE PAIR 
Two bases which form a "rung of the DNA ladder." A DNA nucleotide 
is made of a molecule of sugar, a molecule of phosphoric acid, and a 
molecule called a base. The bases are the "letters" that spell out the 
genetic code. In DNA, the code letters are A, T, G, and C, which stand 
for the chemicals adenine, thymine, guanine, and cytosine, 
respectively. In base pairing, adenine always pairs with thymine, and 
guanine always pairs with cytosine. (from National Human Genome 
Research Institute) 


CENTROMERE 
The constricted region near the center of a human chromosome. This is 
the region of the chromosome where the two sister chromatids are 
joined to one another. (from National Human Genome Research 
Institute) 


CODON 
Three bases in a DNA or RNA sequence which specify a single amino 
acid. (from National Human Genome Research Institute) 


COMPLEMENTARY DNA (CDNA) 
a single-stranded DNA synthesized from a mature mRNA template. 
cDNA is often used to clone eukaryotic genes in prokaryotes. (from 
Wikipedia) 


CHROMATID 
one of two identical strands into which a chromosome splits during 
mitosis. (from WordNet) 


CHROMATIN (CHROMATIN GRANULE) 
the readily stainable substance of a cell nucleus consisting of DNA and 
RNA and various proteins; during mitotic division the chromatin 
condenses into chromosomes. (from WordNet) 


CHROMOSOME 
One of the threadlike "packages" of genes and other DNA in the 
nucleus of a cell. Different kinds of organisms have different numbers 
of chromosomes. Humans have 23 pairs of chromosomes, 46 in all: 44 
autosomes and two sex chromosomes. Each parent contributes one 
chromosome to each pair, so children get half of their chromosomes 
from their mothers and half from their fathers. (from National Human 
Genome Research Institute) 


CLUSTER (GENE CLUSTER) 
A set of closely related genes that code for the same or similar proteins 
and which are usually grouped together on the same chromosome. 
(from BioTech Dictionary) 


COVALENT BOND 
A bond between two or more atoms that is provided by electrons that 
travel between the atoms' nuclei, holding them together but keeping 
them a stable distance apart. (from BioTech Dictionary) 


CROSSING OVER 
The breaking during meiosis of one maternal and one paternal 
chromosome, the exchange of corresponding sections of DNA, and the 
rejoining of the chromosomes. This process can result in an exchange 
of alleles between chromosomes. (from Human Genome Project 
Information) 


CROSSLINKING 
The linking of two strands of DNA by covalent bonds (as opposed to 
the normal hydrogen bonds between base pairs), which can occur by 
exposure to X-rays. (from BioTech Dictionary) 


CYS (CYSTEINE)(C) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


CYTOPLASM 
All the contents of a cell, including the plasma membrane, but not 
including the nucleus. (from UCMP Glossary) 


CYTOSINE 
One of the four bases in DNA that make up the letters ATGC, cytosine 
is the "C". The others are adenine, guanine, and thymine. Cytosine 
always pairs with guanine. (from National Human Genome Research 
Institute) 


DNA 
The chemical inside the nucleus of a cell that carries the genetic 
instructions for making living organisms. (from National Human 
Genome Research Institute) 


DNA MICROARRAY (DNA CHIP) 
a piece of glass or plastic on which single-stranded pieces of DNA 
have been affixed in a microscopic array. (from Wikipedia) 


ENHANCER 
a short region of DNA which can be bound with proteins (namely, the 
trans-acting factors, much like a set of transcription factors) to enhance 
transcription levels of nearby genes (hence the name) in a gene-cluster. 
(from Wikipedia) 


EXON 
The region of a gene that contains the code for producing the gene's 
protein. Each exon codes for a specific portion of the complete protein. 
In some species (including humans), a gene's exons are separated by 
long regions of DNA (called introns or sometimes "junk DNA") that 
have no apparent function. (from National Human Genome Research 
Institute) 


EXPRESSION (GENE EXPRESSION) 
The process by which a gene's coded information is converted into the 
structures present and operating in the cell. Expressed genes include 
those that are transcribed into mRNA and then translated into protein 
and those that are transcribed into RNA but not translated into protein. 
(from BioTech Dictionary) 


GAMETE 
Mature male or female reproductive cell (sperm or ovum) with a 
haploid set of chromosomes (23 for humans). (from Human Genome 
Project Information) 


GENE 
The functional and physical unit of heredity passed from parent to 
offspring. Genes are pieces of DNA, and most genes contain the 
information for making a specific protein. (from National Human 
Genome Research Institute) 


GENOME 
All the DNA contained in an organism or a cell, which includes both 
the chromosomes within the nucleus and the DNA in mitochondria. 
(from National Human Genome Research Institute) 


GENETIC MAP (LINKAGE MAP) 
a chromosome map of a species that shows the position of its known 
genes and/or markers relative to each other, rather than as specific 
physical points on each chromosome. (from National Human Genome 
Research Institute) 


GLN (GLUTAMINE)(Q) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


GLU (GLUTAMIC ACID)(E) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


GLX (GLUTAMINE OR GLUTAMIC ACID)(Z) 
A symbol used when an amino acid residue may be either glutamine or 
glutamic acid, two of the twenty naturally occurring amino acids. 
(adapted from BioTech Dictionary) 


GLY (GLYCINE)(G) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


GUANINE 
One of the four bases in DNA that make up the letters ATGC, guanine 
is the "G". The others are adenine, cytosine, and thymine. Guanine 
always pairs with cytosine. (from National Human Genome Research 
Institute) 


HIS (HISTIDINE)(H) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


HYBRIDIZATION 
A genetics lab technique used to identify which colonies of bacteria on 
a plate contain a particular sequence of DNA or a particular gene. The 
technique involves pressing a nylon or nitrocellulose membrane onto 
the plate so that each colony contributes a small smudge of itself to the 
membrane, then treating the membrane with chemicals and heat, then 
washing the membrane with a labeled probe to find the specific DNA 
sequence. The smudges which are indicated by the probe are then 
compared back to the colonies on the plate. (from BioTech Dictionary) 


ILE (ISOLEUCINE)(I) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


INTRON 
A noncoding sequence of DNA that is initially copied into RNA but is 
cut out of the final RNA transcript. (from National Human Genome 
Research Institute) 


IN SITU HYBRIDIZATION 
The base pairing of a sequence of DNA to metaphase chromosomes on 
a microscope slide. (from National Human Genome Research Institute) 


LEU (LEUCINE)(L) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


LYS (LYSINE)(K) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


MEIOSIS 
The process of two consecutive cell divisions in the diploid 
progenitors of sex cells. Meiosis results in four rather than two 
daughter cells, each with a haploid set of chromosomes. (from Human 
Genome Project Information) 


MET (METHIONINE)(M) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


METAPHASE 
The phase of mitosis, or cell division, when the chromosomes align 
along the center of the cell. Because metaphase chromosomes are 
highly condensed, scientists use these chromosomes for gene mapping 
and identifying chromosomal aberrations. (from National Human 
Genome Research Institute) 


MITOSIS 
The process of nuclear division in cells that produces daughter cells 
that are genetically identical to each other and to the parent cell. (from 
Human Genome Project Information) 


MRNA 
Template for protein synthesis. Each set of three bases, called a codon, 
specifies a certain amino acid in the sequence of amino acids that make up 
the protein. The sequence of a strand of mRNA is based on the 
sequence of a complementary strand of DNA. (from National Human 
Genome Research Institute) 


NORTHERN BLOT 
A technique used to identify and locate mRNA sequences that are 
complementary to a piece of DNA called a probe. (from National 
Human Genome Research Institute) 


NUCLEOTIDE 
One of the structural components, or building blocks, of DNA and 
RNA. A nucleotide consists of a base (one of four chemicals: adenine, 
thymine, guanine, and cytosine) plus a molecule of sugar and one of 
phosphoric acid. (from National Human Genome Research Institute) 


NUCLEUS 
The central cell structure that houses the chromosomes. (from National 
Human Genome Research Institute) 


OLIGO 
Oligonucleotide, short sequence of single-stranded DNA or RNA. 
Oligos are often used as probes for detecting complementary DNA or 
RNA because they bind readily to their complements. (from National 
Human Genome Research Institute) 


PHE (PHENYLALANINE)(F) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary) 


POLYMER 
A polymer is formed from the fusion of two monomers which join 
completely without losing any small molecules. (from BioTech 
Dictionary). 


POLYPEPTIDE 
A protein or part of a protein made of a chain of amino acids joined by 
a peptide bond. (from Human Genome Project Information) 


PRO (PROLINE)(P) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary). 


PROMOTER 
a DNA sequence that enables a gene to be transcribed. The promoter is 
recognized by RNA polymerase, which then initiates transcription. 
(from Wikipedia) 


PROPHASE 
the first stage of meiosis; the first stage of mitosis. (from WordNet) 


PROTEIN 
A large complex molecule made up of one or more chains of amino 
acids. Proteins perform a wide variety of activities in the cell. (from 
National Human Genome Research Institute) 


RECOMBINATION 
The process by which progeny derive a combination of genes different 
from that of either parent. In higher organisms, this can occur by 
crossing over. (from Human Genome Project Information) 


REPLICATION 
The process by which DNA copies itself before cell division. Unless 
mutation occurs, the new copy of DNA is identical to the original 
DNA. (from HOPES) 


RIBOSOME 
Cellular organelle that is the site of protein synthesis (from National 
Human Genome Research Institute) 


RNA 
A chemical similar to a single strand of DNA. In RNA, the letter U, 
which stands for uracil, is substituted for T in the genetic code. RNA 
delivers DNA's genetic message to the cytoplasm of a cell where 
proteins are made. (from National Human Genome Research Institute) 


SER (SERINE)(S) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary). 


SOUTHERN BLOT 
A technique used to identify and locate DNA sequences which are 
complementary to another piece of DNA called a probe. (from 
National Human Genome Research Institute). 


SPLICING 
The joining of separate strands of DNA or RNA. (from Wikipedia). 


TELOMERE 
The end of a chromosome. This specialized structure is involved in the 
replication and stability of linear DNA molecules. (from Human 
Genome Project Information) 


TELOPHASE 
the final stage of meiosis when the chromosomes move toward 
opposite ends of the nuclear spindle; the final stage of mitosis. (from 
WordNet) 


THR (THREONINE)(T) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary). 


THYMINE 
One of the four bases in DNA that make up the letters ATGC, thymine 
is the "T". The others are adenine, guanine, and cytosine. Thymine 
always pairs with adenine. (from National Human Genome Research 
Institute) 


TRANSCRIPTION 
the organic process whereby the DNA sequence in a gene is copied 
into mRNA; the process whereby a base sequence of messenger RNA 
is synthesized on a template of complementary DNA. (from WordNet) 


TRANSCRIPTION FACTOR 
a protein that binds DNA at a specific promoter or enhancer region or 
site, where it regulates transcription. Transcription factors can be 
selectively activated or deactivated by other proteins, often as the final 
step in signal transduction. (from Wikipedia) 


TRANSLATION 
the process whereby genetic information coded in messenger RNA 
directs the formation of a specific protein at a ribosome in the 
cytoplasm. (from WordNet) 


TRNA 
A class of RNA having structures with triplet nucleotide sequences 
that are complementary to the triplet nucleotide coding sequences of 
mRNA. The role of tRNAs in protein synthesis is to bond with amino 
acids and transfer them to the ribosomes, where proteins are assembled 
according to the genetic code carried by mRNA. (from Human 
Genome Project Information) 


TRP (TRYPTOPHAN)(W) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary). 


TYR (TYROSINE)(Y) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary). 


URACIL 
One of the four bases in RNA. The others are adenine, guanine, and 
cytosine. Uracil replaces thymine, which is the fourth base in DNA. 
Like thymine, uracil always pairs with adenine. (from National Human 
Genome Research Institute) 


VAL (VALINE)(V) 
One of the twenty naturally occurring amino acids. (from BioTech 
Dictionary). 


